Feature Selection for Fluency Ranking

Author

  • Daniël de Kok
Abstract

Fluency rankers are used in modern sentence generation systems to pick sentences that are not just grammatical, but also fluent. It has been shown that feature-based models, such as maximum entropy models, work well for this task. Since maximum entropy models allow for the incorporation of arbitrary real-valued features, it is often attractive to create very general feature templates that generate a huge number of features. To select the most discriminative features, feature selection can be applied. In this paper we compare three feature selection methods: frequency-based selection, a generalization of maximum entropy feature selection to ranking tasks with real-valued features, and a new selection method based on feature value correlation. We show that the often-used frequency-based selection performs badly compared to maximum entropy feature selection, and that models with a few hundred well-picked features are competitive with models with no feature selection applied. In the experiments described in this paper, we compressed a model of approximately 490,000 features to 1,000 features.
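The frequency-based baseline the abstract argues against can be sketched in a few lines. This is a minimal illustration under assumed data structures (an event is a dict mapping feature names to real values), not the paper's actual code: keep the k features that are active in the most training events.

```python
from collections import Counter

def frequency_select(events, k):
    """Keep the k features that are active in the most training events.

    events: iterable of dicts mapping feature name -> real value
    (a hypothetical representation; the models in question use
    arbitrary real-valued features over ranking candidates).
    """
    counts = Counter()
    for feats in events:
        # A feature "fires" in an event if it has a non-zero value.
        counts.update(f for f, v in feats.items() if v != 0.0)
    return [f for f, _ in counts.most_common(k)]

events = [
    {"len": 3.0, "rare_rule": 0.0, "depth": 2.0},
    {"len": 5.0, "rare_rule": 1.0, "depth": 0.0},
    {"len": 4.0, "rare_rule": 0.0, "depth": 1.0},
]
print(frequency_select(events, 2))  # → ['len', 'depth']
```

Note that this criterion ignores how a feature's values correlate with fluency judgments, which is precisely why the paper finds it uncompetitive with maximum entropy selection.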


Similar articles

Discriminative features in reversible stochastic attribute-value grammars

Reversible stochastic attribute-value grammars (de Kok et al., 2011) use one model for parse disambiguation and fluency ranking. Such a model encodes preferences with respect to syntax, fluency, and appropriateness of logical forms, as weighted features. Reversible models are built on the premise that syntactic preferences are shared between parse disambiguation and fluency ranking. Given that ...

Full text


Reversible Stochastic Attribute-Value Grammars

An attractive property of attribute-value grammars is their reversibility. Attribute-value grammars are usually coupled with separate statistical components for parse selection and fluency ranking. We propose reversible stochastic attribute-value grammars, in which a single statistical model is employed both for parse selection and fluency ranking.

Full text

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Full text

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

The increasing use of the Internet and phenomena such as sensor networks have led to an unnecessary increase in the volume of information. Though this has many benefits, it causes problems such as requirements for more storage space and faster processors, as well as the need for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

Full text


Journal title:

Volume   Issue 

Pages  -

Publication date: 2010